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Abstract 

We give an approximation algorithm for packing and 
covering linear programs (linear programs with non- 
negative coefficients). Given a constraint matrix with n 
non-zeros, r rows, and c columns, the algorithm (with high 
probability) computes feasible primal and dual solutions 
whose costs are within a factor ofl + s of OPT ( the optimal 
cost) in time 0(n + (r + c) log(n)/£ 2 ). 

For dense problems (with r,c — 0(^/n)) the time is 
0(n + Y / nlog(n)/e 2 ) — linear even as e — > 0. In compar- 
ison, previous Lagrangian-relaxation algorithms generally 
take at least f2(nlog(n) / e 2 ) time, while (for small e) the 
Simplex algorithm typically takes at least Vt(n min(r, c)) 
time. 



1. Introduction 

A packing problem is a linear program of the form 
max{a • x : Mx < b, x £ P}, where the entries of the con- 
straint matrix M are non-negative and P is a convex poly- 
tope admitting some form of optimization oracle. A cover- 
ing problem is of the form min{a • x : Mx > b,x € P}. 

Here we focus on the explicitly given forms, that is, 
max{a • x : Mx < b, x > 0} and min{a • x : Mi > 
b, x > 0}, when the polytope P is the positive orthant. Ex- 
plicitly given packing and covering are important special 
cases of linear programming, including, for example, frac- 
tional set cover, multicommodity flow problems with given 
paths, and variants of these problems. 

We give a (1 ± e)-approximation algorithm — that is, 
an algorithm that returns feasible primal and dual solutions 
whose costs are within a given factor 1 ± e of OPT. The 
algorithm is inherently randomized. With high probability, 
it runs in time 0(n + (r + c) log(n)/e 2 ), where n is the 
number of non-zero entries in the constraint matrix and r+c 
is the number of rows plus columns (i.e., constraints plus 
variables). 

For even moderately dense problems, r and c are o(n) 
and the 1/e 2 term in the running time is multiplied by only 



a sub-linear term. 

The run time is linear if e > f2(y^r + c) log(n)/n). 

In experiments reported here, our first implementation of 
the algorithm typically requires at most 12 (r+c) ln(n)/e 2 + 
0(n) basic operations on dense instances. For comparison, 
the GLPK (Gnu Linear Programming Kit) Simplex algo- 
rithm typically requires at a minimum about 5 min(r, c)rc 
basic operation (and often many more) to compute a near- 
optimal solution. (Note: more sophisticated Simplex 
implementations that maintain sparsity may require only 
0(min(r, c)n) operations.) For example, when e = 0.01, 
the algorithm is faster than GLPK Simplex by factors of 10- 
100 for problems with several thousand rows and columns. 
The speedup grows roughly linearly with rc. 

Our first implementation is relatively simple (fewer than 
a thousand lines of C++, mostly devoted to a fast random- 
sampling data structure). In contrast to Simplex, the al- 
gorithm requires no special techniques to maintain spar- 
sity of the constraint matrix, to deal with numerically ill- 
conditioned matrices, or to deal with basis cycling. 

Related work on Lagrangian-relaxation algorithms. 

The algorithm is a Lagrangian-relaxation algorithm. There 
is a large literature on Lagrangian-relaxation algorithms. 
Bienstock gives an implementation-oriented, operations- 
research perspective O. Arora et al. discuss them from a 
computer-science perspective, highlighting connections to 
other fields such as learning theory [2|. Todd places them 
in the larger context of linear programming in general in his 
overview IT3ll . 

For explicitly given packing and covering, the fastest 
previous Lagrangian-relaxation algorithm that we know of 
runs in time 0((r + c)c\og(n)/e 2 ), where c is the max- 
imum number of columns in which any variable appears 
lfT31 . That algorithm works for mixed packing and covering 
— a more general problem. One can improve that algorithm 
to run in time 0(n log(n)/e 2 ) (an unpublished result), but 
this is still impractically slow for small s and large n. 

For U = maxy My / '(bidj), Grigoriadis and Khachiyan 
give an 0((r + c) log(n)(f/ OPT) 2 /e 2 )-time algorithm Q. 
A pre-processing step ifTTl §2.1] can ensure that this is at 



most 0((r + c) log(n)(min(r, c)/e) 4 ). Although imprac- 
tical for general packing and covering problems, the algo- 
rithm is interesting in the following special case: given a 
two-player zero-sum matrix game with payoffs in [—1,1], 
one wants to find mixed strategies that guarantee an ex- 
pected payoff within an additive e + of optimal. For this 
case the time is 0((r + c) log(n)/e?_), which can be sub- 
linear in the input size. Their result uses an unusual tech- 
nique of coupling primal and dual algorithms, a technique 
that is central to our algorithm also. 

Tradeoffs in the dependence on 1 je. Recent works re- 
duce the dependence on e to 0(1/ e) for some packing 
and covering problems. Bienstock and Iyengar give an al- 
gorithm for concurrent multicommodity flow that solves 
0*(e~ 1 k 1 - 5 \V\°' 5 ) shortest-path problems, where k is the 
number of commodities and \V\ is the number of vertices 
|]4) . Chudak and Eleuterio continue this direction. For ex- 
ample, they give an algorithm for fractional set cover run- 
ning in time 0*(c 15 (r + c)/e + c 2 r) 0. This direction is 
motivated by the observation that in practice, even for mod- 
erately small e, the 1/e 2 factor in the running time makes 
previous algorithms impractically slow for large n. How- 
ever, the decreased dependence on e comes at the cost of 
increasing the dependence on other parameters. 

Note that the approximation parameter e plays a dif- 
ferent role than the problem-size parameters — as com- 
puting power and problem sizes grow, the case when e is 
a small constant (on the order of a fraction of a percent, 
say) will still be of interest. More concretely, compare the 
0*(c 15 (r + c)/e) term in the running time of Chudak et 
al.'s algorithm to the 0*((r + c)/e 2 ) term in the running 
time of the algorithm here. The former term is smaller only 
for e < 1/c 15 , but for e this small both algorithms take at 
least 0*(c 3 (r + c)) time — slower than alternatives such as 
Simplex. Arguably, neither algorithm is of practical interest 
for e so small. 

In sum, both theoretically and practically a main case 
of interest is when 1/e is a moderately large constant (a 
hundred to a thousand), while the number of rows and 
columns still grows asymptotically (beyond several thou- 
sand or more). In this case we prefer an 0(l/e 2 ) term in 
the running time to, say, an 0(c/e) term. An important 
goal is to design algorithms that are practical for e in this 
range. 

Technical approach. The following lower bound shows 
that some dependence on 1/e 2 is necessary [9|. With 
high probability, a packing problem with random matrix 
M £ {0, 1} C xc has no (1 — e)- approximate primal solu- 
tion with o(log(n)/e 2 ) non-zero entries. (This lower bound 
only holds when e is not too small — roughly £l(y/c) — 
which is why it does not preclude algorithms with o(l/e 2 ) 



dependence on e but larger dependence on other parame- 
ters.) 

The lower bound implies that to build a (1 — e)- 
approximate solution requires at least f2(log(n)/e 2 ) vari- 
ables to be incremented (set to a non-zero value). Most al- 
gorithms take at least fl(n) amortized time per increment, 
leading to total time at least f2(n log(n) /e 2 ). From this per- 
spective, the challenge is to reduce the time per increment 
as much as possible. 

To do this we use the following main ideas. We start 
with a variant of Grigoriadis and Khachiyan's algorithm [7 1, 
which is essentially as follows. It starts with all-zero primal 
and dual solutions. In each iteration, one coordinate Xj of 
the primal solution is increased by 1, where j is randomly 
chosen from a distribution p that depends on M T x, where x 
is the current dual solution. Likewise, a coordinate Xi of the 
dual solution is increased by 1, where i is chosen at random 
from a distribution p that depends on M x, where x is the 
current primal solution. 

(This algorithm can be interpreted as a form of fictitious 
play of a two-player zero-sum game, where in each round 
each player plays from a distribution concentrated around 
the best response to the aggregate of the opponent's histori- 
cal plays.) 

This random choice of increments can be implemented 
more efficiently than the more common approach of finding 
"optimal" increments. The reason is that the latter approach 
requires choosing an increment Ax to the primal solution to 
minimize p T M Ax, and separately an increment Ax to the 
dual solution to maximize p T M T Ax. The difficulty with this 
is that it requires maintaining the additional vectors p T M 
and p T M T instead of just p and p. (A change in one entry of 
x changes many entries in p, but even more entries in p T M. 
We are able to bound the total number of changes to entries 
in p by 0(r log(n) /e 2 ), but not so for changes to entries in 
p T M.) 

This basic ("slow") algorithm is analyzed in Lemma Q] 
To speed it up (and generalize it), we incorporate Garg and 
Konemann's non-uniform-increments amortization scheme 
[|6l . In each iteration, we make the algorithm increment 
the primal and dual variables not by 1, but by an amount 
just large enough so that some left-hand side (LHS) Mix or 
MjX increases by f2(l). This is small enough to allow the 
correctness proof to go through, but large enough to guar- 
antee progress. 

Finally, instead of maintaining Mx and M T x exactly, we 
maintain them approximately with a careful random sam- 
pling. This way, when some Mix increases by a relatively 
small amount in an iteration (because some Xj increases 
where My is relatively small), we have only a proportion- 
ally small chance of doing work to update the estimate of 
Mix. Yet the estimates are still accurate in expectation and 
with high probability. 



Sections [2] 12.11 and 12.21 give the main algorithm, prove 
correctness, and bound the worst-case run time, respec- 
tively. Section [3]presents our experimental results, includ- 
ing a comparison with the GLPK Simplex algorithm. 

2. Algorithm and analysis 

In the rest of the paper we assume the primal and dual 
problems are of the following restricted forms, respectively: 
max{|x| : Mx < l,x > 0}, min{|i| : M T x > l,x > 
0}. This is without loss of generality by the transformation 

Slow algorithm. For explanatory purposes, we first give a 
simpler but slow version of the algorithm, essentially a vari- 
ant of Grigoriadis and Khachiyan's algorithm [7 |. For this 
we assume My £ [0, 1] and we don't analyze the running 
time, which can be large. This algorithm demonstrates the 
use of coupled random increments to the primal and dual. It 
returns a (1 — 2e)-approximate primal-dual pair with high 
probability. 



Note that the scaling at the end ensures feasibility. Also, 
| a; | and |i| are always equal, so to prove the approxi- 
mation guarantee we show maxi Mix is not too large in 
comparison to mirij MJx (we want mirij MJx > (1 — 
0(e)) max; Mix at the end). To show this, we show that 
\p\ ■ \p\ (the product of the 1 -norms) is a Lyapunov function 
for the system — that it is non-increasing in expectation. 

Lemma 1 The slow algorithm returns a (1 — 2e)- 
approximate primal-dual pair (feasible primal and dual so- 
lutions x* and x* such that \x*\ > (1 — 2e)|i*|) with prob- 
ability at least 1 — 1/ (re). 

proof: In a given iteration, let p and p denote the vectors 
at the start of the iteration. Let p 1 and p' denote the vectors 
at the end of the iteration. Let Aa; denote the vector whose 
jth entry is the increase in Xj during the iteration (or if z 
is a scalar, Az denotes the increase in z). Then (using that 



AM.i = Mi Ax e [0, 1]) 

\ P '\ = $>(1 + £ ) M * A * 

i 

< ^2 Pi (l + eMiAx) 

i 

= \p\[l + e P T MAx/\p\}. 

Likewise \p'\ < \p\[l - ep T M T Ax/\p\]. 

Multiplying these bounds on \p'\ and \p'\, and using that 

(l + a)(l-6) = \ + a-b-ab< 1 + a - 6 for a, 6 > 0, 

\ P '\\p'\ < \p\\p\[l + e(p/\p\yMAx-EAx T M(p/\p\)]. 

This inequality motivates the "coupling" of primal and dual. 

Taking expectations and plugging in E[Ax] = p/\p\ and 
E[A£] = p/\p\ from the definition of the algorithm, the 
right-hand terms cancel and we get E[|p' | \p'\] < \p\\p\. 

This and Wald's equation (Lemma|9]l imply that the ex- 
pectation of \p\\p\ at termination is at most its initial value 
rc. So, by the Markov bound, with probability at least 
1 — 1/rc, at termination \p\\p\ < (rc) 2 . Assume this hap- 
pens. Then, for all i and j, (1 + e) M * x (l - e) M ^ < (rc) 2 . 

Taking logs gives (1 — e) maxj Mix < min^ MJx + eN. 
(See the proof of Theorem[T]or Lemma [TOlfor details.) 

By the termination condition max^ Mix > N, so this 
implies (1 — 2e) max^ Mix < mirij MjX. 

This and \x\ = \x\ imply the performance guarantee, rj 



Full algorithm. In the remainder of the paper we analyze 
the full algorithm. The algorithm is in Fig. Q] except for 
some implementation details that are left until Section l2~2l 
In comparison to the slow algorithm, the algorithm incor- 
porates the following additional features: 

• It uses non-uniform increments. That is, in each iter- 
ation it increases the randomly chosen xy and by 
some increment Si'j', chosen so that the maximum in- 
crease in any left-hand side (LHS) (i.e. max; AM; a; or 
maxj AMJx) is 6(1). It also deletes covering con- 
straints once they become satisfied. (These basic ideas 
are from @[ll.) 

It adjusts the sampling distributions accordingly 
to maintain that the expected changes still satisfy 
E[Aa;] = ap/\p\ and E[Aa;] = ap/\p\ for an a > 0. 

Implementing this requires maintaining the follow- 
ing data structures: a set J of indices of still-active 
columns (covering constraints); for each column MJ 
the maximum entry uf and for each row Mj, a close 
upper bound Ui on the maximum active entry. 



slow-alg(M e [0, l] rxc ,e) 

1. Vectors x, x <— 0; scalar N = [21n(rc)/e 2 ]. 

2. Repeat until max^ M^a; > N: 

3. Let Pi = (1 + e) M ' x and p = (1 - e) M ^ x . 

4. Choose random indices j' and i' respectively 

from probability distributions p/\p\ and p/\p\. 

5. Increase xy and aV each by 1. 

6. Let (x* , x*) = (xj maxj MiX,x/ mirij Mjx). 

7. Return (x* , x*). 



SOlve(M € IR^ XC , e) — return a (1 — 6e) -approximate primal-dual pair w/ high probability. 

1. Initialize vectors x, x, y, y *— 0, and scalar N — [2 ln(rc)/e 2 ] . 

2. Fix Uj = max{A/jj : i £ [r]} for j £ [c). (This is the largest entry in column j of M.) 

3. The algorithm will increment x and x, maintaining y and y so E[j/] = Mx, E[y] — M T x. 
It will also maintain vectors p defined by p,; = (1 + s) Vi and, as a function of y: 

J = {j £ [c] : jjj < N} (the active columns) 

Ui £ [1, 2] x max{ My : j £ J} (approximates the largest active entry in row i of M) 

• f (i-e)* ifie J 

J 1 otherwise. 

It will also maintain vectors p x u and p x u. 

4. Repeat until maxj yi — N or mirij yj = A^: 

5. Let «— random-pair(p, p, p x x u). 

6. Increase x^/ and i^/ each by the same amount Si/ji = + 

7. Update y, y, and the other vectors as follows: choose random z £ [0, 1] uniformly, and 

8. for each i £ [r] such that MijiSvji > z, increase yi by 1 (and multiply pi and (p x u)j by 1 + e); 

9. for each j £ J such that Mi/jSi'ji > z, increase yj by 1 (and multiply pj and (p x u)j by 1 — e). 

10. For each j leaving J, update J, u, and p X u. 

11. Let (a;*,x*) = [xj maXiMiX, x/mirij Mjx). Return (x*,x*). 



Figure 1 . The full algorithm. Notation: pxu denotes a vector with ith entry pA; [c] denotes {1, 2, . . . , c}. 
Implementation details are in Section EH 



random-pair(p,p, p x u,p x u) 

1. With probability |p x u||p|/(|p x u||p| + |p||p x u\) 

choose random i' from distribution p x u/|p x u\, 
and independently choose j' fromp/|p|, 

2. or, otherwise, 

choose random i' from distribution p/|p|, 

and independently choose j' fromp x u/\p x u|. 

3. Return (i',j r ). 



The increment Syji is taken to be 1/ (u^ + Uji), so that 
when Xj' and are increased by Spy, the maximum 
increase in any LHS (any M^x or Mjx) is in [1/4, 1]. 

• It maintains the vectors Mx and M T x only approxi- 
mately (as y and y), using a simple random sampling. 
For example, if Mix increases by 1/10 in an iteration, 
then the algorithm increases y,; by 1 with probabil- 
ity 1/10. This reduces the total work, yet maintains 
y w Mx and y « M T x with high probability. 

To implement this the algorithm chooses a random 
z £ [0, 1]. It then increments j/j by 1 if the increase in 
Mix is at least z, and increments yi by 1 if the increase 
in MjX is at least z. To do this efficiently, rows and 
columns are first internally sorted (or approximately 
sorted) in decreasing order. 

With these changes, we bound the running time by charging 
the work done to increases in \Mx\ + \M T x\. 

The following subroutine random-pair() chooses the 
random pair of indices (i',f). Any given pair is 
chosen with probability proportional to pif)j(ui + Uj) = 
PiPj/Sij. 



2.1. Correctness 

Theorem 1 With probability at least 1 — 3 /re, the algo- 
rithm in Fig. Q] returns feasible primal and dual solutions 
(x*,x*) with |ar*|/|5*| > 1 - 6e. 

proof: To start we show the algorithm's basic properties. 
Lemma 2 In each iteration, for a — \p\ \p\/ Ylji PiPj /&ij> 

1. The largest change in any relevant LHS is at least 1/4: 
max{maxj AMiX, maxjgj AMJx} £ [1/4, 1]. 

2. The expected changes in various quantities are: 
E[Ax] = ap/|p|, E[Ay] = E[AMx] = aMp/\p\, 
E[Ax] = ap/\p\, E[Ay] = E[AM T x] = aM J p/\p\. 



proof: (1) By the choice of u and u, for the (i',f) chosen, 

max{max Miji Si'ji , max Miu8i>ji } 

i jeJ 

E [1/2, 1]<%/ max{zij/, Uj>} 
C [1/4,1]<%/(?v +ttj/) 
= [1/4,1]. 

(2) First, we verify that the probability that random-pair() 
returns a given is a(pi/|p|)(j3j/|p|)/<5y ■. Here is the 
calculation. The probability is proportional to 



\P x "I \P\ 



PiUj Pj 

\P x u\ \p\ 



\p\ \py-u 



Pi PJ U J 

p \p x u\ 



which by algebra simplifies tDpifij(ui + Uj) — Pipj/Sij. 

Thus, the probability must be a(pi/\p\)(pj/\p\)/5ij, be- 
cause the choice of a makes the sum over all i and j of the 
probabilities equal 1 . 

Next, note that part ( 1 ) of the lemma implies that in line[8] 
(given the chosen i' and j') the probability that a given j/j is 
incremented is MijiSi'ji, while in line|9]the probability that 
a given yj is incremented is Mi>j8i>j'. 

Now the equalities in (2) follow by direct calculation. 
For example: 

E[Axj-] - E l (ap l /\p\)(p 3 /\p\)/S iJ )S l] = ap 3 /\p\. n 

The next lemma is in the same spirit as Lemma [TJ We 
use the coupling in the algorithm to show that the Lyapunov 
function </> = \p\ \p\ is non-increasing in expectation and that 
this and the termination condition imply the approximation 
guarantee (as long as y sa Mx and y w M J x). 

Lemma 3 With probability at least 1 — 1 /rc, when the al- 
gorithm stops, maxi yi < N and minj yj > N(l — 2s). 

proof: Let p' and p' denote p and p after a given iteration, 
while p and p denote the values before the iteration. 

We claim that, givenp and p, E[\p'\ \p'\] < \p\ \p\ — with 
each iteration \p\ \p\ is non-increasing in expectation. 

To prove it, note \p' \ = J2iPi( l + £ Ayi) = \p\+sp T Ay 
and, similarly, \p'\ = \p\ — ep T Ay. 

Multiply the latter two equations and drop a negative 
term to get 

\p'\\p\ < \p\\p\+s\p\p T Ay-e\p\p T Ay. 

The claim follows by applying linearity of expectation, then 
substituting E[Ay] = aMp/\p\ and E[Ay] = aM T p/\p\ 
from Lemma H] 

By Wald's equation (Lemma [9), the claim implies that 
E[|p| \p\] at termination is at most the initial value rc. 

Applying the Markov bound, with probability at least 1— 
1/rc, at termination max^ pi maxj pj < \p\\p\ < (rc) 2 . 



Assume this happens. The index set J is not empty at 
termination, so the minimum yj is achieved for j 6 J. 
Substituting in the definitions of pi and pj and taking logs, 
maxi yi ln(l + e) < mux,- yj ln(l/(l — e)) + 2 ln(rc). 

Divide by ln(l/(l - e)), apply l/ln(l/(l - e)) < 1/e 
and also ba(l + e)/ln(l/(l — e)) > 1 — e. This gives 

(1 — s) max.; yi < mirij yj + 2 ln(rc)/e < minj yj + sN. 

By the termination condition max^ yi < N is guaran- 
teed, and either max^ y^ = N or minj yj — N. So if 
minj jjj = N, then the event in the lemma occurs, and oth- 
erwise maxi yi = N, which (with the inequality in previous 
paragraph) gives (1 — e) N < minj yj + eN. rj 

Next we establish that y w Mx and y w M T x. 

Lemma 4 

(1) For any i, with probability at least 1 — l/(rc) 2 , when 
the algorithm terminates, (1 — e)MiX < 2/i + ^N. 

(2) For any j, with probability at least 1 — l/(rc) 2 , after 
the last iteration with j € J, (1 — e)yj < MJx + eN. 

proof: (1) In each iteration Mix increases by at most 1 
(by the choice of 6i'j>), yi increases by at most 1, and (by 
Lemma|2]l the expected increases in these two quantities are 
the same. So, by the Chernoff bound for random stopping 
times (Lemma [Toll, Pr[(l — e)MiX > yi + eN) is at most 
cxp(— e 2 N) < l/(rc) 2 . This proves (1). 

The proof for (2) is similar, noting that, while j S J, 
MjX increases by at most 1 each iteration. q 

We now prove Theorem [JJ Recall that the algorithm re- 
turns (x*, x*) = (x/ maxj M%x, x/ minj MJx). 

By the naive union bound, with probability at least 1 — 
3/rc the event in Lemma [3] occurs and (for all i and j) the 
events in Lemma|4]occur. 

Assume all of these events happen. 

By algebra, using (1 — a) (1 — b) > 1 — a — b and 
1/(1 +e) > 1 - e, we have (1 - 2e) max. Mix < N 
and minj MJx > (1 - 4e)N. 

This implies minj MJx/ max^ Mix > 1 — 6e. 

Since the sizes |x| and \x\ increase by the same 
amount each iteration, they are equal. Thus, |ar*|/|5*| = 
minj MJx/ max^ Mix > 1 — 6e. q 



2.2. Running time 

In this section we describe fast implementations of the 
algorithm. We assume that the matrix M is given in any 
standard sparse representation (so that the non-zero entries 
can be traversed in time proportional to the number of non- 
zero entries). 



Simpler implementation. We first describe an imple- 
mentation that takes 0(nlogn + (r + c) log(n)/e 2 ) time. 
We then describe how to modify it to remove the log n fac- 
tor from the first term. 

Theorem 2 The algorithm can be implemented to return a 
( 1 — 6e) -approximate primal-dual pair for packing and cov- 
ering in time 0(n log n+(r+c) log(n)/e 2 ) with probability 
at least 1 — 4/rc. 

proof: Use the data structure of |[T2l (see also 0), which 
maintains a vector v and allows random sampling from the 
distribution v/\v\ and updating of entries of v in 0(1) time. 
Maintain such a data structure for each of the vectors p, p, 
py.ii, and p x u. Then random-pair() runs in 0(1) time, 
and each update of an entry of p, p, p x u, or p x u takes 
0(1) time. 

At the start, pre-process the matrix M. Build, for each 
row and column, a doubly linked list of the non-zero en- 
tries. Sort each list in descending order. Cross-reference 
the lists so that, given an entry My in the ith row list, 
the corresponding entry My in the jth column list can be 
found in constant time. The total time for pre-processing is 
0(n log n). 

Maintain the data structures during each iteration as fol- 
lows. Let T t denote the set of indices i for which yi is 
incremented in line [8] in iteration t. From the random 
z € [0, 1] and the sorted j'th row list, compute this setXt by 
traversing the row list, collecting elements until an i with 
My < z/Si'j' is encountered. Then, for each i S Tt, up- 
date yi, pi, and the ith entry in p x u. Likewise, let Jt 
denote the set of indices j for which yj is incremented in 
line [9] Compute Jt from the sorted i'th column list. For 
each j £ J t , update pj, and the jth entry in p x u. The total 
time for these operations during the course of the algorithm 
is 0(£ t l + \l t \ + \J t \). 

For each element j that leaves J, update pj. Delete all 
entries in the jth column list from all row lists. For each row 
list i whose first (largest) entry is deleted, update the corre- 
sponding fit by setting ut to be the next (now first and max- 
imum) entry remaining in the row list; also update (p x 
The total time for this during the course of the algorithm is 
0(n), because each My is deleted at most once. 

This completes the implementation. 

The total time is O(nlogn) plus 0(J2t 1 + \ x t\ + \Jt\)- 
To finish we bound the latter term. 

Lemma 5 

Et fc\ + \Jt\ < (r + c)N = 0((r + c)\og(n)/e 2 ). 

proof: First, £ t \T t \ < rN because each yt can be in- 
creased at most N times before max^ t/, > N (causing ter- 
mination). Second, £. \J t \ < cN because each jjj can be 
increased at most N times before j leaves J. rj 



So the total time is 0(n\ogn + (r + c) log(n)/e 2 ) plus 
0(# t such that \2 t \ + \Jt\ = 0). It remains to bound the 
latter term. Call iteration t empty if |I t | + \J t \ — 0. 

Lemma 6 Given the state at the start of an iteration, the 
probability that it is empty is at most 3/4. 

proof: Given the chosen in the iteration, by (1) of 

Lemma |2] there is either an i such that My/ Syji > 1/4 or 
a j such that Mi'jSi/j' > 1/4. In the former case, i e T t 
with probability at least 1/4. In the latter case, j E J t with 
probability at least 1/4. rj 

Lemma 7 With probability at least 1 — 1 /re, the number of 
empty iterations is 0((r + c)N). 

proof: Let E t be 1 for empty iterations and otherwise. 
By the previous lemma and the Chernoff bound tailored for 
random stopping times (Lemma [Tot, for any S, A > 0, 



Pi- 



ll 



.4 



is at most exp(— 6 A). Taking 5=1/2 and A = 2 ln(rc), it 
follows that with probability at least 1 — 1/ rc, the number of 
empty iterations is bounded by a constant times the number 
of non-empty iterations plus 2 ln(rc). We know the number 
of non-empty iterations is at most (r + c)N, so we conclude 
that with probability at least 1 — 1/rc the number of empty 
iterations is 0((r + c)N). rj 

If the event in Lemma|7]happens, then (since each empty 
iteration takes 0(1) time) the total time is 0(n log n + (r + 
c) log(n)/e 2 ). This and Theorem[T]imply Theorem[2] rj 



Faster implementation. Next we remove the log n factor 
from the n log n term in the running time. The idea is that 
it suffices to approximately sort the row and column lists. 

Theorem 3 The algorithm can be implemented to return a 
( 1 — 7e)-approximate primal-dual pair for packing and cov- 
ering in time 0(n + (c + r) log(n)/e 2 ) with probability at 
least 1 — 5/rc. 

proof: Modify the algorithm as follows. 

First, pre-process M as described in ifTTl §2.1] so that 
the non-zero entries have bounded range. Specifically, let 
[3 = mkymaxi My. Let My = if My < /3e/c and 
My min{(3c/e, Mij} otherwise. As shown in [11 1, any 
(1 — 6e)-approximate primal-dual pair for the transformed 
problem will be a (1 — 7e)-approximate primal-dual pair for 
the original problem. 

In the pre-processing step, instead of sorting the row 
and column lists, pseudo-sort them — sort them based on 



keys [l°g2 These keys will be integers in the range 

log 2 (/3)±log(c/e). Use bucket sort, so that a row or column 
with k entries can be processed in 0(k + log(c/e)) time. 
The total time for pseudo-sorting the rows and columns is 

0(n+(r + c) log(c/e)). 

Then, in the fth iteration, maintain the data structures as 
before, except as follows. 

Compute the set T t as follows. Traverse the pseudo- 
sorted jth column until an index i with Mij/Si'ji < z/2 
is found. (No indices later in the list can be in Tt.) Take all 
the indices i seen with M^idyy > z. Compute the set Jt 
similarly. Total time for this is 0(Et l + KI + 1^/1)' where 
T' t and J[ denote the sets of indices actually traversed (so 
l t C T[ and J t C J[). 

When an index j leaves the set J, delete all entries in the 
jth column list from all row lists. For each row list affected, 
set Ui to two times the first element remaining in the row 
list. This ensures Ui e [1, 2] max^-gj M^. 

These are the only details that are changed. 
The total time is now 0(n + (r + e)log(c/e)) plus 
°(Et 1 + \ J t \ + \Jt !)• To finish we bound the latter term. 

Lemma 8 With probability at least 1 — 2 /rc, 

£ t (l + KI + l#l) = 0((r + c)N). 

proof: Consider a given iteration. Fix i' and j' chosen in 
the iteration. For each i, note that, for the random z 6 [0, 1], 



Pr[iel' t } < Pr[z/2 < MifSi'f] 

< 2Mij'Si'j> 

= 2Vx[z<M lf 5 lir ] 

= 2Pr[iGT t ]. 

Fix an i. Applying Chernoff for random stopping times 
(Lemma [Toll, for any 6, A > 0, 



Pr 



(i-*)E t [*eza > 2Et[*G2i] + A 



is at most exp(— 8 A). (Above [i £ S] denotes 1 if i 6 S 
and otherwise.) 

Taking 5 = 1/2 and A = 41n(rc), with probability at 
least 1 - (rc) 2 , £ t [i € T t ] < 4^Ji € It] + 81n(rc). 

Likewise, for any j, with probability at least 1 — l/(rc) 2 , 
Etb'eJ/] < 2£ t [jej 4 ] + 81n(rc). 

Taking the naive union bound over all i and j, with 
probability at least 1 - 1/rc, ^tO^tl + l^tl) is at most 
4E t (|2*l + l^t|) + 8(r + c)ln(rc). 

By Lemma|5]this is 0((r + c)N). 

We know (Lemma|7]i that the number of empty iterations 
is 0((r+c)N) with probability at least 1 — 1/rc. The lemma 
follows by applying the naive union bound. q 



If the event in the lemma happens, then the total time is 
0(n + (r + c) log(n)/e 2 ). This proves TheoremfJ] q 



3. Empirical results 

We tested our algorithm experimentally and compared its 
running time to that of the GLPK (Gnu Linear Programming 
Kit) Simplex algorithm (glpsol version 4.15 with default 
options). Our first conclusion is that the running time of 
our implementation is well-predicted by the analysis, with 
a leading constant factor of about 12 basic operations in 
the big-O term in which e occurs. Our second conclusion 
is that for large inputs the algorithm can be substantially 
faster than Simplex. For inputs with 2500-5000 rows and 
columns, the algorithm (with e = 0.01) is faster than Sim- 
plex by factors ranging from tens to hundreds. The speedup 
grows roughly linearly in rc. 

We used inputs with r, c e [739,5000], e € 
{0.02,0.01,0.005}, and matrix density d G {l/2 fc : k = 
1,2,3,4,5,6}. For each (r,c,d) tuple that we used, we 
generated a random 0/1 matrix with r rows and c columns, 
where each entry was 1 with probability d. We ran our 
solver for each e and compared its running time to that taken 
by a Simplex solver to find a (1 — e)-approximate solution. 
GLPK failed to finish due to cycling on about 10% of the 
runs we initially tried; we excluded those inputs from our 
experiments. This left 167 runs. The complete data for the 
non-excluded runs is given at the end of the paper. 

The running time of our algorithm includes (A) time 
for pre-processing and initialization, (B) time for sampling 
(line [5] once per iteration of the outer loop), and (C) time 
for increments (lines [8] and [9] once per iteration of the in- 
ner loops). Theoretically the dominant terms are 0{n) for 
(A) and 0((r + c) \og(n)/e 2 ) for (C). For the inputs tested 
here, the significant terms in practice are for (B) and (C), 
with the role of (B) diminishing for larger problems. The 
time (number of basic operations) is well-predicted by the 
expression 



H2(r- 



480d _1 ] ln(rc)/e 2 



(1) 



where d = l/2 k is the density (fraction of matrix entries 
that are non-zero, at least 1/ min(r, c)). 

The 12(r + c) ln(rc)/e 2 term is the time spent in (C), 
the inner loops; it is the most significant term in our experi- 
ments as r and c grow. 

The less significant term 480e?~ 1 ln(rc)/e 2 is for (B), 
and is proportional to the number of samples (that is, it- 
erations of the outer loop). Note that this term decreases 
as matrix density increases. (In our implementation we fo- 
cused on reducing the time for (C), not for (B). It is proba- 
ble that the constant 480 above can be reduced with a more 
careful implementation.) 



The plot below plots the actual running time divided by 
the estimate (Q]), as a function of the estimate, for each of 
the inputs. The x-axis is on a log 10 scale. The plot shows 
that the actual number of basic operations is close to the 
estimate for all the inputs: 



y = (^operations) / (predicted ^operations) 
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During our runs the average time per operation was not 
constant, it grew with larger instances by as much as a fac- 
tor of two. We don't know why. We observed this effect 
across a number of machines. We suspect caching or mem- 
ory allocation issues. The plot below shows the growth in 
average time per operation. 



y = (normalized time per operation) 
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The next plot shows the run time in seconds, divided by 
the predicted time (estimated time per operation times the 
predicted number of operations (fl]i): 
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Next we compared the speed of our algorithm to that 
of the GLPK Simplex algorithm. We estimate the time 



for Simplex to find a near-optimal approximation to be at 
least 5min(r, c)rc basic operations. This estimate comes 
from assuming that at least f2(min(r, c)) pivot steps are re- 
quired (because this many variables will be non-zero in the 
final solution), and each pivot step will take 0(?'c) time. 
(This holds even for sparse matrices due to rapid fill-in, 
although we note that more sophisticated solvers such as 
CPLEX may do a better job of maintaining sparsity.) The 
leading constant 5 comes from our experimental evaluation. 
We note that this estimate seems conservative, and indeed 
GLPK Simplex often exceeds it in our trials. 

Here's a plot of the time for Simplex to find a (1 — e)- 
approximate solution, divided by the conservative estimate 
(5 min(r, c)rc times the estimated time per operation): 
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Combining the above estimates, a conservative estimate 
of the speed-up in using our algorithm (that is, the time for 
Simplex divided by the time for our algorithm) is 



5 min(r, c)rc 



[12(r + c) + 480cT 



(2) 



ln(rc)/e 2 . 

c and e = 0.01 



This is about (r/310) 2 / ln(r) when r 
(for r large). 

The plot below plots the actual measured speed-up di- 
vided by the conservative estimate (O, as a function of the 
estimated running time of our algorithm. 
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The plot shows that the speedup is typically at least as 
predicted in (|2]), and often more. 



4. Implementation issues 



Appendix 



The primary implementation issue is implementing the 
random sampling efficiently and precisely. The data struc- 
tures in |[T2l [SJ, have two practical drawbacks. The con- 
stant factors in the running times are moderately large, and 
they implicitly or explicitly require that the probabilities 
being sampled remain in a polynomially bounded range. 
However, our application uses these data structures in a re- 
stricted way, and we were able to use the underlying ideas 
to build an appropriate data structure with very fast entry- 
update time and moderately fast sample time. We focused 
more on reducing the update time than the sampling time, 
because we expect more update operations than sampling 
operations. Full details of the practical implementation is- 
sues will be provided in a later paper or on request to the 
authors. 

5. Future directions 

Can one extend this approach to mixed packing and cov- 
ering problems, or prove that this is not possible (in a rea- 
sonable model)? What about covering with "box" con- 
straints (upper bounds on individual variables)? Can one 
extend the approach to general packing and covering, e.g. to 
maximum multicommodity flow (where P is the polytope 
whose vertices correspond to all s; — * ti paths)? In all of 
these cases, correctness of a natural algorithm is easy to es- 
tablish, but the running time is an issue. This seems to be 
because the coupling approach requires a symmetry (both 
primal and dual algorithms of a particular kind must exist) 
that the methods based on "optimal" increments do not re- 
quire. 

Can one adapt this algorithm to efficiently solve dy- 
namic problems, or sequences of closely related problems 
(e.g. each problem comes from the previous one by a small 
change in the constraint matrix)? Adapting the algorithm to 
start with a given primal/dual pair seems straightforward. 

Can one use this approach to improve parallel and dis- 
tributed algorithms for packing and covering (e.g. lfTTl[T5l ). 
perhaps reducing the dependence on e from 1/e 4 to 1/e 3 ? 
In this case, instead of incrementing a randomly chosen 
variable in each of the primal and dual solutions, one would 
increment the primal and dual solutions deterministically by 
a fractional vector: incrementing the primal vector x by ap 
and the dual vector x by ap for some a. The correctness 
proof goes through. Can one bound the number of itera- 
tions, assuming the matrix is appropriately preprocessed? 



Utility lemmas 

The first is a one-sided variant of Wald's equation: 

Lemma 9 ([14., lemma 4.1]) Let K be any finite number. 
Let xq,Xi,...,xt be a sequence of random variables, 
where T is a random stopping time with finite expectation. 

IfE[xt — Xt-i | Xt—i] < H and (in every outcome) Xt — 
Xt—i < K for t < T, then Fi[xt — xq] < fx E[T], 

The second is a Chernoff bound tailored for random 
stopping times. 

Lemma 10 Let X = Y^t=i x t an d Y = Y^t=i Vt ^ e 
sums of non-negative random variables, where T is a ran- 
dom stopping time with finite expectation, and, for all t, 
\xt — yt \ < 1 and 

E[sBt-yt | Y,a<t x >iT, a< tV>\ ^ °- 

Let e £ [0, 1] and A&R Then 

Pr [(1 -e)X > Y + A] < exp(-eA). 

proof: Fix A > 0. Consider the sequence 7To , iri , . . . , ttt 
where 7r t = for t > AE[T] and otherwise 

s<t 

= vr f _ 1 (l + £) 3;t (l-e)^ 
< 7T t _i(l + ex t - ey t ) 

(using (1 +e) x (l — e) y < (l+ex — ey) when \x — y\ < 1). 

As E[x t - y t | TTt-i] < 0, we have E[7r< | x t -i] < Tt-i- 

Note that, from the use of A, J2s<t Xs ~ V s anc ' ( mere_ 
fore) nt — iit-i are bounded. Thus Wald's (Lemma[9]l, im- 
plies E[7Tt] < 7T0 = 1. 

Applying the Markov bound, 

Pr[7r T > exp(eA)] < exp(-eA). 

So assume ttt < exp(sA). Taking logs, if T < AE[T], 
Xln(l+e)-Fln(l/(l-e)) = \mr T < sA. 

Dividing by ln(l/(l — e)) and applying the inequalities 
ln(l + e)/ln(l/(l-e)) > l-eande/ln(l/(l-e)) < 1, 
we get (1 - e)X < Y + A. Thus, 

Pi[(l-e)X > Y + A] 

< Pr[T > AE[T]] + Pr[7r T > exp(eA)] 

< 1/A + exp(-eA). 

Since A can be arbitrarily large, the lemma follows. rn 
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The following tables tabulate the details of the experimental 
results described earlier: "t-alg" is the time for our algorithm in 
seconds; "t-sim" is the time for Simplex to find a (1 — e)-optimal 
soln; "t-sim%" is that time divided by the time for Simplex to 
complete; "alg/sim" is t-alg/t-sim. 
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